W542-W546 Nucleic Acids Research, 2012, Vol. 40, Web Server issue Published online 8 May 2012 

doi:10.1093/nar/gks373 

RPF: a quality assessment tool for protein 
NMR structures 

Yuanpeng Janet Huang 1, Antonio Rosato 2, Gautam Singh 1 and
Gaetano T. Montelione 1,3,*

1 Center for Advanced Biotechnology and Medicine, Northeast Structural Genomics Consortium, Rutgers
University, 679 Hoes Lane, Piscataway, NJ 08854, USA, 2 Magnetic Resonance Center and 
Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy and 3 Robert Wood Johnson 
Medical School, University of Medicine and Dentistry of NJ, 675 Hoes Lane, Piscataway, NJ 08854, USA 

Received February 10, 2012; Revised April 3, 2012; Accepted April 12, 2012 



ABSTRACT 

We describe the RPF web server, a quality assess- 
ment tool for protein NMR structures. The RPF 
server measures the 'goodness-of-fit' of the 3D 
structure with NMR chemical shift and unassigned 
NOESY data, and calculates a discrimination power 
(DP) score, which estimates the differences 
between the fits of the query structures and 
random coil structures to these experimental data. 
The DP-score is an accuracy predictor of the query 
structure. The RPF server also maps local structure 
quality measures onto the 3D structure using an 
online molecular viewer, and onto the NMR 
spectra, allowing refinement of the structure and/ 
or NOESY peak list data. The RPF server is available 
at: http://nmr.cabm.rutgers.edu/rpf. 

INTRODUCTION 

Protein NMR spectroscopy provides infrastructure for 
research in biophysical chemistry. One of the challenges 
of protein structure determination by NMR is the lack of 
a broadly accepted 'R factor', comparing 3D structures
with the raw, uninterpreted experimental data. Such R 
factors have been critical to the development of X-ray 
crystallography as a routine protein structure analysis 
method (1). Instead, NMR structures are generally 
validated against the derived experimental distance con- 
straint lists, which are an interpreted and incomplete rep- 
resentation of the data in the NOESY and other NMR 
spectra (2). 

RPF is an 'R-factor'-like protein structure validation 
tool, which assesses the completeness of experimental 
data and its agreement with the 3D structure (3). Because 
it is difficult to compare structures directly against raw 
experimental NMR spectral data, these analyses are performed
with respect to minimally interpreted experimental
data, i.e. NOESY spectra peak lists and resonance assignments.
RPF also calculates a discriminating power (DP)
score that estimates how well the query structure satisfies 
the data relative to a statistical random-coil structure. The 
DP-score ranges from 0 to 1 (3). 

The RPF protein structure quality assessment program 
has been used by the Northeast Structural Genomics 
(NESG) Consortium of the NIGMS Protein Structure 
Initiative over the last several years. It is a core component 
of the Protein Structure Validation Server (PSVS) 
analysis (4), and has been applied in the assessment and/ 
or refinement of more than 400 protein NMR structures. 
RPF has also been used as a key component of the recently 
developed CS-DP-Rosetta method (5), the GLM-RMSD 
accuracy prediction score (6) and the Critical Assessment 
of Automated Protein Structure Determination by NMR 
(CASD-NMR) project (7,8). 

Some commonly used knowledge-based protein structure
validation tools that assess the geometric and stereochemical
quality of the structure include (i) Verify3D (9),
(ii) ProsaII scores (10), which evaluate the global fold likelihood,
(iii) PROCHECK scores (11), which assess the
distribution of backbone and side-chain dihedral angles,
and (iv) the MolProbity clash score (12), which assesses
the occurrence of high-energy interatomic contacts. We
examined the correlations between these quality scores 
(including the DP score) and accuracy of the structures 
using 63 protein structure ensembles generated in the 
CASD-NMR2010 project (8). In this communication, we 
summarize data showing that, of these measures, only the 
DP-score has significant correlation with the accuracy of 
the protein structures. 

CALCULATING RPF/DP SCORES 

The algorithm to calculate RPF scores (i.e. Recall,
Precision, F-measure) and the DP-score is described elsewhere
(3). Briefly, Recall measures the percentage of input
NOESY peaks that can be explained by the input query
structure(s) with a distance cut-off <5 Å. Precision
measures the percentage of ¹H-¹H distances <5 Å
calculated from the query structure that are observed in
the NOESY data. F-measure combines the Recall and
Precision scores, and estimates how well the input NMR
structure ensemble fits with the input NMR data. The DP
score is a normalized F-measure, which estimates
the significance of the F-measure score for the query structure
relative to what would be obtained for a random-coil
structure fit to the same experimental data.

The F-measure provides an assessment of the overall fit 
between a query model structure and the experimental 
data. Low F scores indicate that the query structure 
does not fit well with the data. A high-quality NMR struc- 
ture is expected to both (i) fit well to the NMR data (i.e. 
high F-measure score) and (ii) have enough long-range 
contacts to distinguish it from a freely rotating chain 
model (i.e. high DP scores). High F scores and low DP 
scores indicate that the NMR data does not have enough 
long-range information to distinguish the structures from 
a 'random coil' structure. 
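
As a rough numerical illustration of how these scores relate to each other, the sketch below assumes the conventional harmonic-mean form of the F-measure and a simple linear rescaling for the DP score between a random-coil baseline and an upper bound. Both forms are assumptions made here for illustration only; the definitions actually used by RPF are given in ref. (3). The function names and sample values are hypothetical.

```python
def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of Recall and Precision (assumed form of the F-measure)."""
    if recall + precision == 0.0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


def dp_score(f_query: float, f_random: float, f_upper: float = 1.0) -> float:
    """Rescale F so the random-coil baseline maps to 0 and f_upper maps to 1.

    Assumed normalization for illustration; see ref. (3) for the definition
    used by RPF.
    """
    if f_upper <= f_random:
        raise ValueError("f_upper must exceed f_random")
    return (f_query - f_random) / (f_upper - f_random)


# Hypothetical values, not taken from any real structure.
f_query = f_measure(recall=0.90, precision=0.85)
f_random = f_measure(recall=0.60, precision=0.70)
print(round(f_query, 3), round(dp_score(f_query, f_random), 3))
```

Under this assumed normalization, a structure that fits the data no better than the random-coil baseline gets a DP score near 0 even when its F-measure is high, which is the situation described in the paragraph above.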

Calculating the Precision score requires identifying all
¹H-¹H distances <5 Å in the query structure, which
is a typical 3D range-searching problem in computational geometry
(13). In the most recent version of RPF, we have implemented
the k-D tree algorithm (14) to speed up the ¹H-¹H
distance calculation. Using the k-D tree, a set of n ¹H atoms
can be preprocessed in O(n log n) time into a data structure
of O(n) size so that any 3D range query can be
answered in O(n^{2/3} + k) time, where k is the number of
answers reported (14). Without the k-D tree
algorithm, the distance calculation takes O(n^2) time, which
becomes prohibitively time consuming for larger
proteins or for studies involving the assessment of
hundreds of decoys (5).
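
The range search described above can be sketched in a few lines of Python. This is a minimal illustration of a k-D tree radius query, not the RPF implementation. The class and function names are hypothetical, and for brevity the build step sorts at every level, which is slightly slower than the O(n log n) construction cited in the text.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


class KDNode:
    def __init__(self, index: int, axis: int,
                 left: "Optional[KDNode]", right: "Optional[KDNode]") -> None:
        self.index = index  # index of the splitting point in the input list
        self.axis = axis    # 0, 1 or 2: coordinate used to split at this node
        self.left = left
        self.right = right


def build(points: List[Point], indices: List[int], depth: int = 0) -> Optional[KDNode]:
    """Build a k-D tree by splitting at the median along cycling axes."""
    if not indices:
        return None
    axis = depth % 3
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(indices[mid], axis,
                  build(points, indices[:mid], depth + 1),
                  build(points, indices[mid + 1:], depth + 1))


def within(node: Optional[KDNode], points: List[Point], center: Point,
           radius: float, found: List[int]) -> None:
    """Append to `found` the indices of all points closer than `radius`."""
    if node is None:
        return
    p = points[node.index]
    if math.dist(p, center) < radius:
        found.append(node.index)
    delta = center[node.axis] - p[node.axis]
    # Always search the side that contains the query point; search the other
    # side only if the splitting plane is closer than the radius.
    near, far = (node.left, node.right) if delta < 0 else (node.right, node.left)
    within(near, points, center, radius, found)
    if abs(delta) < radius:
        within(far, points, center, radius, found)


def close_pairs(points: List[Point], cutoff: float = 5.0) -> List[Tuple[int, int]]:
    """Return all index pairs (i, j), i < j, separated by less than `cutoff`."""
    root = build(points, list(range(len(points))))
    pairs = []
    for i, p in enumerate(points):
        found: List[int] = []
        within(root, points, p, cutoff, found)
        pairs.extend((i, j) for j in found if j > i)
    return sorted(pairs)


# Toy coordinates in Å (hypothetical, not from a real structure).
print(close_pairs([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]))
```

Pruning the far subtree whenever the splitting plane lies beyond the cut-off is what avoids comparing every atom against every other atom.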



THE RPF WEB SERVER 

The input files required for RPF are: (i) the atomic coordinate
files in PDB format, (ii) chemical shift data in
BMRB format and (iii) NOESY peak lists in Sparky or
Xeasy format. Examples of input data are provided on the
home page of the web server.

The RPF server reports the quality scores for each in- 
dividual conformer in the NMR ensemble, and for the 
structure ensemble as a whole (i.e. using the mid-range 
¹H-¹H distances of the ensemble). We observe that the
RPF scores for the ensemble as a whole are generally 
higher (i.e. fit the NOESY peak list data better) than 
scores for the individual conformers. Accordingly, the 
ensemble is a better representation of the uncertainty 
and/or the dynamic conformational averaging effects 
that underlie the NOESY data. 

Precision Violations (i.e. false-positive interactions) are
short ¹H-¹H distances in the query structures that are not
supported by NOESY peak list data. The first result page
displays the distribution of the Precision Violations on the
query structure using the Java viewer Jmol (15). The color
is coded based on a heat index where red represents
residues with extensive (many and/or large) Precision
Violations, and blue represents residues with few or no
Precision Violations (Figure 1A) (3). In Figure 1A, as an
example, residues 29 and 32 are colored red, indicating
that some of the very short ¹H-¹H distances observed in
the query structure are not supported by the combined
NOESY peak list and chemical shift data. Such
Precision Violations generally arise from inaccurate
local structure, inaccurate resonance assignments, or the
effects of NMR resonance broadening due to intermediate
timescale conformational exchange (3).

The 'Precision Violations' report summarizes all ¹H-¹H
distances <5 Å in the query structures that are not supported
by the NOE peak list data. It is possible to use
regular expressions to filter the list of Precision
Violations. Figure 1B illustrates an example using a
regular expression search for Precision Violations
involving residues 29 and 32 with a maximum distance of 3.0 Å.
The 'Recall Violations' page (i.e. false-negative interactions)
(Figure 1C) reports the resonance frequencies of
peaks in the NOESY spectrum that, considering all
possible assignments of the NOESY peak, are not consistent
with the 3D query structure(s). These 'Precision
Violations' and 'Recall Violations' are local quality score 
measures, which can be overlooked when looking at 
global RPF and DP scores. Precision Violations and 
Recall Violations provided in these reports can be 
mapped back to the 3D structure and NMR spectrum, 
respectively, providing guidance for further peak list 
and/or 3D structure refinement. This validation process 
is used extensively by NESG consortium NMR scientists 
in the final stages of protein structure refinement. 

The RPF web server provides a web-service for 
large-scale NMR structure quality assessments. The user 
can also save the RPF results locally and review them 
again on the website by uploading the file to the RPF 
web server. Sample data, including input files and 
results, are provided on the home page of the RPF web 
server. The Help Page of the RPF web server includes 
information on how to interpret the RPF results. 
Sample code to access the RPF web-service is also
provided on the Help Page.



CORRELATION OF DP SCORE WITH THE 
ACCURACY OF PROTEIN NMR STRUCTURES 

The Critical Assessment of Automated Protein Structure 
Determination by NMR (CASD-NMR) (8) is an interna- 
tional NMR community project in which refined NOESY 
peak lists and resonance assignment lists are distributed 
while the manually refined NMR structure is held in confidence.
Participants then carry out fully automated NMR
structure analysis with these 'blind' data, which are sub- 
sequently compared with the manually refined NMR 
structure and/or X-ray crystal structures. In the 2010 
session of CASD-NMR (CASD-NMR2010), participants 
submitted 63 NMR protein structure ensembles for 10 
monomeric proteins, ranging in size from 60 to 150 
amino acid residues (8). For each of these structure
ensembles, we calculated the backbone RMSD to the
corresponding manually refined structure in the PDB. We
refer to this measure of accuracy (assuming that the manually
refined reference structures are correct) as the RMSD
bias. We also computed the global distance test total score
(GDT_TS) (16), a structural similarity measure that, unlike
RMSD calculations, does not require residue ranges to be
pre-defined and is independent of protein size. The
GDT_TS score was developed as a local-global alignment
method for structure comparison, and has been
extensively used for assessing the accuracy of protein
structure predictions in CASP assessments (17). High
structural similarity corresponds to low RMSD and high
GDT_TS values.

Figure 1. RPF output. (A) The distribution of the Precision Violations (a.k.a. false-positive interactions) mapped on the query structure based on a
heat index. Red represents residues with strong Precision Violations and blue represents residues with few or no Precision Violations. In this example,
residues 29 and 32 are colored red, indicating that several very short distances based on the input structure do not have corresponding NOE data in
the NOESY peak list and/or one or more of the corresponding resonances are mis-assigned in the chemical shift list. (B) The 'Precision Violations'
page reports all distances <5.0 Å calculated from the query structures that are not supported by the NOESY data. In this example, there are six
Precision Violations involving residues 29 or 32 with a maximum distance of 3.0 Å. (C) The 'Recall Violations' page reports the input NOESY peaks that are
not supported by the query structures within the average distance of 5.0 Å.
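
The two accuracy measures used in the text, backbone RMSD and GDT_TS, can be illustrated with a short sketch. It assumes the model has already been superimposed on the reference and that the per-residue Cα distances (in Å) are available. The GDT_TS function below averages the fraction of residues within 1, 2, 4 and 8 Å for that single superposition, whereas the full GDT_TS procedure searches over many superpositions to maximize each fraction, so this simplified value is a lower bound. Function names and sample distances are hypothetical.

```python
import math
from typing import Sequence


def rmsd(distances: Sequence[float]) -> float:
    """Root-mean-square deviation from per-residue distances (Å)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts_single_superposition(distances: Sequence[float],
                                cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average percentage of residues within each cut-off, for one superposition."""
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical per-residue distances after superposition.
sample = [0.5, 1.5, 3.0, 10.0]
print(round(rmsd(sample), 2), gdt_ts_single_superposition(sample))
```

In this toy case a single badly placed residue dominates the RMSD, while it lowers the GDT_TS value by only one residue's share at each cut-off.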

The DP score from the RPF program, along with five
additional geometric and stereochemical quality scores,
was calculated for each of the 63 submitted CASD-NMR2010
structure ensembles using the Protein Structure
Validation Server (PSVS) (4). The five knowledge-based
structure quality scores assessed were the
PROCHECK-φ/ψ score (11), the PROCHECK-All
dihedral score (11), the MolProbity clash score (12), the
Verify3D fold score (9) and the ProsaII fold score (10).
Using these 63 protein structure ensembles, a significant
correlation is observed between the DP-score and the structure
accuracy (Figure 2). However, no significant correlation
is observed between any of the other five knowledge-based
validation scores and the RMSD bias (Table 1).
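
For readers who wish to repeat this kind of comparison on their own structure sets, Pearson's correlation coefficient between a quality score and an accuracy measure can be computed as sketched below. The function name is hypothetical and the numbers are toy values chosen only to show the call; they are not CASD-NMR2010 data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Toy values only: DP scores and backbone RMSDs (Å) for four made-up models.
dp_scores = [0.85, 0.78, 0.62, 0.40]
rmsds = [0.9, 1.4, 2.8, 5.1]
print(round(pearson_r(dp_scores, rmsds), 2))
```

A score that tracks accuracy is expected to give a strongly negative coefficient against RMSD and a strongly positive one against GDT_TS.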

We define a structure as 'accurate' when either
(i) backbone RMSD < 2.0 Å or (ii) GDT_TS > 80.
Table 2 summarizes the confusion matrix and metrics for
accuracy prediction on the basis of the DP score. Very few
false-positive and false-negative errors are found for the
CASD-NMR2010 structures (Table 2) (8). The range of
DP-scores for the manually refined reference structures is
0.79-0.90, except for one (AR3426A, 0.64; in this case, the
NOESY data are unusually weak; many expected NOEs
with very close distances have rather weak intensities or
are missing from the spectra). A DP-score cut-off >0.7
allowed the identification of acceptably accurate
CASD-NMR2010 structures with a reliability of 95%
(Table 2), based on the available NOESY peak lists. All
structures with an RMSD to the reference >3.0 Å or a
GDT_TS score <60% had DP-scores lower than 0.6,
except for a single instance. Based on these data, we
conclude that a protein NMR structure will usually
satisfy our definition of 'accurate' when its ensemble
DP-score is >0.7.
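
The classification rule and the Table 2 metrics can be written out explicitly. The sketch below encodes the 'accurate' definition and the DP-score cut-off stated above, then recomputes sensitivity, specificity and precision from the Table 2 counts (44 TP, 2 FP, 4 FN, 13 TN). The function names are hypothetical.

```python
from typing import Dict


def is_accurate(backbone_rmsd: float, gdt_ts: float) -> bool:
    """'Accurate' as defined in the text: RMSD < 2.0 Å or GDT_TS > 80."""
    return backbone_rmsd < 2.0 or gdt_ts > 80.0


def predicted_accurate(dp: float, cutoff: float = 0.7) -> bool:
    """Prediction on the basis of the DP score alone."""
    return dp > cutoff


def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Counts taken from Table 2.
table2 = metrics(tp=44, fp=2, fn=4, tn=13)
print({name: round(value, 3) for name, value in table2.items()})
```

Rounded to three decimals, these ratios reproduce the values reported in Table 2 (0.917, 0.867 and 0.957).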



DISCUSSION 

RPF versus NOE restraint violation scores 

NOE restraint violation statistics measure the fitness of 
structure coordinates with the NOE-derived distance 
restraints. Several protein structure validation servers 
compute both restraint violation statistics and DP score 
[e.g. the PSVS server (4)]. A high-quality structure tends to
have a high DP score and low NOE restraint violations.
Figure 2. Correlation between accuracy measures (backbone RMSD to the reference structure and GDT_TS score) and the DP-score. The various
thresholds mentioned in the text are highlighted by the continuous (RMSD < 2 Å; GDT_TS > 80) and dashed (DP-score > 0.7) lines. These results
demonstrate the discriminating power of the DP score in distinguishing accurate from less accurate protein NMR models.



Table 1. Pearson's correlation coefficient between various accuracy and quality scores for the same data shown in Figure 2 
Table 2. Confusion matrix and metrics for accuracy prediction on the
basis of the DP-score

                              Success
                              Positive      Negative
DP-score prediction
  Positive                    44 (TP a)     2 (FP b)
  Negative                    4 (FN c)      13 (TN d)

Metrics
  Sensitivity [TP/(TP + FN)]  0.917
  Specificity [TN/(TN + FP)]  0.867
  Precision e [TP/(TP + FP)]  0.957

a True positives (TP) are accurate structures (i.e.
RMSD < 2.0 Å or GDT_TS > 80) that are correctly predicted to be
accurate on the basis of their DP-score higher than the threshold
(i.e. 0.7).

b False positives (FP) are inaccurate structures that are erroneously
predicted to be accurate on the basis of their DP-score higher than the
threshold.

c False negatives (FN) are accurate structures that are erroneously
predicted to be inaccurate on the basis of their DP-score lower than
the threshold.

d True negatives (TN) are inaccurate structures that are correctly
predicted to be inaccurate on the basis of their DP-score lower than
the threshold.

e The precision (i.e. the ratio of true positives among all positive
predictions) becomes 1.00 at a DP-cut-off of 0.76.



However, it is possible that an incorrect structure with low 
DP score can also have low restraint violations; e.g. the 
NOE restraints may have been incorrectly assigned or 
otherwise incorrectly derived from the NOESY data. 

Limitations for analyzing larger size proteins and 
homodimeric proteins 

For larger size proteins (e.g. >200 amino acids), it is often 
necessary to use perdeuterated samples for structure 
determination. The RPF program can handle validation 
of protein structures using data from such perdeuterated 
protein samples, by excluding the deuterated atoms from 
the chemical shift assignment table. The computed RPF 
score provides useful measures of how well the data fit
with the structure. However, the correlation between the 
RPF/DP scores and the structure accuracy is not as high 
as with fully protonated proteins, because data from 
perdeuterated proteins is much sparser. In particular, 
close H-H contacts, which may be critical to distinguish 
the correct from incorrect fold, are less extensive in the 
perdeuterated data set, making the DP score less sensitive 
to the structure accuracy. We suspect that an accurate 
structure will require a higher DP score cut-off using 
data from a perdeuterated protein compared to using 
fully protonated protein data. Additional test data sets are
needed to assess the best way to use the DP score for data 
obtained on perdeuterated proteins. 

The RPF program can also analyze homodimeric 
proteins. This requires the user to first combine the two
identical chains into a single chain, with the residues of the second
chain given different residue indices. The RPF/DP score may be less sensitive for highly
degenerate homodimeric proteins if many intermolecular 
NOEs, which may be critical to define the correct inter- 
molecular packing, are degenerate with intramolecular 
NOEs. We also suspect that an accurate highly degenerate 
homodimeric protein structure will require a higher DP 
score cut-off than a protein structure with less degenerate 
resonance frequencies. 
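
One way to prepare such a combined chain is sketched below, using the fixed-column layout of PDB ATOM records (chain identifier in column 22, residue number in columns 23-26). The offset of 1000, the chain identifiers and the function name are hypothetical choices, and the exact numbering convention expected by RPF is an assumption here; the examples on the server home page should be consulted for the required layout.

```python
from typing import Iterable, List


def merge_chains(pdb_lines: Iterable[str], second_chain: str = "B",
                 target_chain: str = "A", offset: int = 1000) -> List[str]:
    """Move `second_chain` into `target_chain`, shifting its residue numbers.

    Only ATOM/HETATM records are rewritten; other records (e.g. TER) are
    passed through unchanged and may need manual attention.
    """
    merged = []
    for line in pdb_lines:
        is_coord = line.startswith(("ATOM", "HETATM")) and len(line) >= 26
        if is_coord and line[21] == second_chain:
            res_seq = int(line[22:26]) + offset
            if res_seq > 9999:
                raise ValueError("renumbered residue does not fit in four columns")
            line = line[:21] + target_chain + f"{res_seq:4d}" + line[26:]
        merged.append(line)
    return merged


# A single made-up ATOM record for chain B, residue 1.
example = ("ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "B" + "   1"
           + "    " + "  11.104  13.207   2.100")
print(merge_chains([example])[0])
```

The chemical shift and peak list files would need to follow the same renumbering so that the assignments of the two chains remain distinguishable.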

CONCLUSIONS 

The RPF scores measure the fitness of NOESY peak list and
resonance assignment data with NMR structure models. 
RPF scores, particularly the DP score, have a strong correl- 
ation with structure accuracy. Although other structure 
quality assessment tools [e.g. PROCHECK-all (11) and 
Molprobity (12)] do not correlate well with the structure 
accuracy based on the CASD-NMR2010 data, these 
knowledge-based assessments are nonetheless very important
tools for protein structure validation. Such
knowledge-based methods compare observed conform- 
ational distributions and packing interactions with values 
observed in nature and/or expected on first principles. In 
general, an accurate NMR-derived protein structure should 
score well in all of these different and complementary views 
of structure quality (3). 

High RPF scores and high PROCHECK and 
Molprobity scores indicate that a structure both fits the 
data well and has good stereochemical qualities. This is a 
goal of the structure determination process. High RPF 
scores and slightly lower PROCHECK and Molprobity 
scores indicate that a structure fits the data well, but 
that the data may not be sufficient to correctly define 
local conformations. In this case, additional data and/ 
or refinement may be required. However, good 
PROCHECK, Molprobity and other knowledge-based 
scores may be obtained for inaccurate structures which 
do not in fact fit well to the NMR data (8). Provided 
that the quality of input NOESY data is high, such structures
would have poor RPF scores, and particularly a DP
score <0.6. The RPF server provides an effective and convenient
tool for evaluating and validating protein structures
derived from NOESY data.